ABSTRACT

A systolic array is a natural architecture for the
implementation of a Reed-Solomon (RS) encoder and decoder.
It possesses many of the properties desired for a
special-purpose application: simple and regular design,
concurrency, modular expansibility, fast response time,
cost-effectiveness, and high reliability. Because its design
is simple and regular, it is also very well suited to VLSI
implementation.

This thesis takes a modular approach to the design of a
systolic-array-based RS encoder and decoder. Initially, the
concept of systolic arrays is discussed, followed by an
introduction to finite field theory and Reed-Solomon codes.
Then it is shown how RS codes can be encoded and decoded with
primitive shift registers and implemented using a systolic
architecture. In this way, the reader can gain insight into
how these elements are combined to produce the overall
implementation.

I.   INTRODUCTION

In  this  very  volatile  and  technological  age,  it  is 
imperative  that  communication  links  and  computer  memories 
transmit  information  reliably  and  quickly.  However,  in  many 
cases  this  is  virtually  impossible  because  noise  causes  the 
received  data  to  differ  significantly  from  the  original 
data.  In  order  to  rectify  this  situation  error-correcting 
codes  have  been  developed  to  enable  a  system  to  continually 
maintain  a  high  degree  of  reliability  despite  the  presence 
of  noise.  To  accomplish  the  error  correction,  in  addition 
to  the  data  or  information  bits  that  are  transmitted,  some 
additional  redundant-check  bits  or  parity  bits  are  also 
transmitted.  In  this  way,  although  the  noise  may  introduce 
some  errors  in  either  the  transmitted  data  bits  or  the 
transmitted  check  bits,  there  are  usually  still  enough 
uncorrupted  bits  available  to  the  receiver  to  allow  a 
sophisticated  decoder  to  correct  the  errors.  In  fact,  only 
a  modest  amount  of  redundancy  is  actually  needed  to  ensure 
that  the  probability  of  the  decoding  error  is  negligibly 
small.     [Ref.    1] 

Moreover, unlike the encoders and decoders of the 1950's and
1960's, which were constrained by digital hardware costs and
virtually nonexistent chip technology, today's encoders and
decoders, coupled with significant improvements in their
associated algorithms, have become, and will continue to be,
increasingly attractive from an economic viewpoint.

One class of error-correcting codes that is very popular in
communication circles, and is central to this author's
discussion of systolic array encoders and decoders, is the
Reed-Solomon (RS) codes. These codes can correct both random
and burst errors over a communication channel, and as such
are ideal for the very low error probabilities needed for
reliable space communications. Still, the usefulness of RS
codes is limited by the complexity of the encoder that
produces them and of the decoder by which errors are
corrected. The encoder complexity is directly proportional
to the error-correcting capability of the code, the speed of
the encoding process, and the interleaving level used, i.e.,
the number of original codewords which are multiplexed
together to increase the immunity of codes to burst errors
[Ref. 1]. In fact, for truly reliable space communications
there is a bona fide need to use RS codes with a large
error-correcting capability and an equally large
interleaving level. As a result, one is especially
interested in minimizing the complexity of an RS encoder
while simultaneously ensuring maximum performance and high
reliability. Clearly, what is needed for this type of
application is a special-purpose system which complements
the aforementioned attributes. Therefore, a systolic array
is a natural architecture for the simple, regular, and
cost-effective implementation of an RS encoder and decoder.

To aid comprehension, this author has devoted a chapter to
each topic vital to the thesis. After systolic arrays are
introduced in Chapter II, the fundamentals of finite fields
needed for an understanding of Reed-Solomon codes are
discussed in Chapters III and IV. In Chapter V a systolic
array multiplier for finite fields is discussed, and finally
in Chapter VI the encoder and decoder for binary codes are
described, as well as the encoder and decoder for RS codes.

II.   SYSTOLIC ARRAYS

A.   BACKGROUND 

It  is  clear  today  that  developments  in  microelectronics 
have  made  a  revolutionary  impact  on  computer  design 
[Ref.  2].  For  example,  integrated  circuit  technology  has 
made  a  significant  increase  in  the  number  and  complexity  of 
components  that  can  now  fit  on  a  chip  or  a  printed  circuit 
board.  In  fact,  with  the  component  density  presently 
doubling  every  one- to- two  years,  the  notion  of  the  million- 
transistor chip will soon be a reality [Ref. 3]. Commensurate
with this major increase in chip density is the utilization of
highly parallel computing structures which, almost by
definition, imply a basic computational element repeated
hundreds or thousands of times. This architectural
style,  which  has  structural  properties  suitable  for  VLSI 
implementation,  reduces  the  design  problem  by  several  orders 
of  magnitude.  As  a  result,  we  are  interested  in  high- 
performance  parallel  structures  that  can  be  implemented 
directly  via  very  economical  hardware  devices  [Ref.  2].  In 
other  words,  cost-effectiveness  has  always  been,  and  will 
continue  to  be,  a  major  concern  in  designing  special- purpose 
VLSI  systems;  their  cost  must  be  low  enough  to  justify  their 
limited applicability. Furthermore, if a structure can
truly be decomposed into a few types of building blocks
which are used repetitively with simple interfaces,
tremendous savings can be achieved.

This  is  especially  true  for  VLSI  designs  where  a  single 
chip  usually  comprises  hundreds  of  thousands  of  identical 
components.  Clearly,  in  order  to  overcome  this 
complexity,  simple  and  regular  designs  are  essential.  In 
fact, VLSI systems which are based on simple, regular
layouts are very likely to be modular and therefore adjustable
to  various  performance  levels.  Still,  with  the  technological 
indication  of  a  diminishing  growth  rate  for  component  speed, 
any  major  improvement  in  computation  speed  must  come  from 
the  concurrent  use  of  many  processing  elements.   [Ref.  3] 

The  degree  of  concurrency  in  a  VLSI  computing  structure 
is  largely  determined  by  the  underlying  algorithm. 
Consequently,  massive  parallelism  can  be  achieved  if  the 
algorithm  is  designed  to  exploit  high  degrees  of  pipelining 
and multiprocessing. When a large number of processing
elements work simultaneously, however, coordination and
communication become significant, especially with VLSI
technology, where routing costs dominate the power, time, and
area required to implement a computation. Thus, the
requirement is to design algorithms that support high
degrees of concurrency and at the same time employ only
simple, regular communication and control to ensure
efficient implementation. [Ref. 4]

Clearly, what is required is a special-purpose design
which  employs  simple  and  regular  communication  paths  for 
multiprocessor  structures  in  addition  to  pipelining  as  a 
general  method  for  utilizing  these  structures.  In  short, 
systolic  arrays  provide  a  realistic  model  of  computation 
which  captures  these  concepts  of  pipelining,  parallelism, 
and  interconnection  structures. 

According  to  Kung  and  Leiserson  [Ref.  2]: 

A  systolic  array  is  a  collection  of  relatively  simple 
processing  units,  usually  of  the  same  type,  which  are 
connected  together  by  a  simple  communication  network  and 
that  operate  in  parallel,  as  depicted  in  Figure  2.1. 
The performance advantage of a systolic array architecture
is that it uses each datum retrieved from memory
numerous  times  without  having  to  store  and  retrieve 
intermediate  results,  thus  allowing  significant  speedups 
relative  to  the  memory  bandwidth.  Thus,  a  systolic 
system  is  a  network  of  processors  which  rhythmically 
computes  and  passes  data  through  the  system.  The 
analogy  is  to  the  rhythmic  contraction  of  the  heart 
which  pulses  blood  through  the  circulatory  system  of  the 
body.  Each  processor  in  a  systolic  network  can  be 
thought  of  as  an  element  through  which  multiple  streams 
of  data  are  pumped.  The  regular  beating  of  these 
parallel  processors  maintains  a  constant  flow  of  data 
throughout  the  entire  network.  As  data  items  are  pumped 
through  the  network  some  constant-time  computation  is 
performed  and,  depending  on  the  operation,  updates  of 
some  of  the  items  may  occur.  However,  unlike  the 
closed-loop  circulatory  system  of  the  body,  a  systolic 
computing  system  usually  has  ports  into  which  inputs 
flow, and ports from which the results of the computation
are received. Thus, a systolic system can be
viewed  as  a  pipelined  system — one  in  which  input  and 
output  occur  with  every  pulsation. 

As a result, a systolic system is extremely attractive for a
wide class of compute-bound computations where multiple
operations are performed on each data item in a repetitive
manner.

[Figure 2.1 panels: (a) one-dimensional linear array;
(b) triangular array; (c) two-dimensional square array;
(d) two-dimensional hexagonal array]

Figure 2.1   Various Systolic Array Configurations

B.   PRINCIPLE OF OPERATION

The basic principle of a systolic array is illustrated in
Figure 2.2. As stated earlier, by replacing a single
processing element (PE) with an array of processing elements,
a higher computation throughput can be achieved without
increasing the memory bandwidth.

Suppose  each  processing  element  in  Figure  2.2  operates 
with a clock period of 100 nanoseconds (ns). The
conventional memory-processor organization in Figure 2.2a
has  at  most  a  performance  of  5  million  operations  per 
second  (MOPS).  With  the  same  clock  rate,  the  systolic 
array  processor  will  result  in  30  MOPS  performance. 
This  gain  in  processing  speed  can  also  be  justified  with 
the  fact  that  the  number  of  pipeline  stages  has  been 
increased  six  times  in  Figure  2.2b.  Being  able  to  use 
each  input  data  item  a  number  of  times  is  just  one  of 
the  many  advantages  of  the  systolic  approach.  Other 
advantages  include  modular  expansibility,  simple  and 
regular  data  and  control  flows,  use  of  simple  and 
uniform  cells,  elimination  of  global  broadcasting, 
limited  fan-in  and  fast  response  time.  [Ref.  3] 

Given these properties, a systolic array is a natural
architecture for the implementation of an RS encoder and
decoder, as will become apparent after the introduction of
Reed-Solomon codes in Chapter IV.

[Figure 2.2 panel label: (b) a systolic processor array]

Figure 2.2   The Concept of a Systolic Processor Array


III.   FINITE  FIELD  THEORY 

A.   BACKGROUND 

Finite  or  Galois  fields  (named  after  the  nineteenth 
century  French  mathematician  Evariste  Galois)  play  many 
important  and  diverse  roles  in  numerous  applications  ranging 
from digital signal processing to switching theory. However,
in this thesis we are concerned with their use in the
construction  of  Reed- Solomon  error- correcting  codes.  We 
begin  with  a  general  analysis  of  the  pertinent  facts 
regarding  finite  fields.  In  the  next  chapter  the  necessary 
facts  about  Reed-Solomon  codes  are  discussed. 

A  field  is  a  set  of  elements,  including  0  and  1,  any 
pair  of  which  may  be  added  or  multiplied  (denoted  by  +  and 
*,  respectively)  to  give  a  unique  result  in  the  field.  The 
addition  and  multiplication  are  associative  and  commutative, 
and  multiplication  distributes  over  addition  in  the  usual 
way: u*(v+w) = u*v + u*w. Every field element u has a unique
negative -u such that u+(-u) = 0. Every nonzero field element
u has a unique reciprocal field element 1/u, such that
u*(1/u) = 1. For every field element u, 0+u = u = 1*u, and 0*u = 0.
Thus  the  numbers  0  and  1  are  the  additive  and  multiplicative 
identities,  respectively.  [Ref.  5] 

The order of a field is the number of elements in the
field. If the order is infinite, we call the field an
infinite field. The rational numbers, the real numbers, and
the complex numbers are all examples of infinite fields. If
the number of elements is finite, we call the field a finite
field [Ref. 5].

For any prime p and any positive integer m, a Galois field
denoted GF($p^m$) or GF(q), where $q = p^m$, exists. We can
construct a field containing $p^m$ elements as an algebra of
polynomials modulo an irreducible polynomial over GF(p) of
degree m. Addition is component-by-component addition
modulo p, which for p = 2 is bit-by-bit modulo 2 addition.

The multiplicative group of the nonzero field elements is
cyclic, i.e., it is a group that consists of all the powers
of one of its elements, $\beta$. Multiplication is defined as
$\beta^i * \beta^j = \beta^{i+j}$, where i+j is computed modulo
($p^m - 1$) and $\beta$ is a generator of this group. A
generator of this multiplicative group, called a primitive
element, is a root of an irreducible polynomial over the
prime field GF(p). This irreducible polynomial, called a
primitive polynomial, is the minimal polynomial of the
primitive element, i.e., the polynomial of least degree with
the primitive element $\beta$ as a root. Generally speaking,
an irreducible polynomial is analogous to a prime number: it
has no nontrivial factors. Lastly, the Galois fields that
can be created by taking residue or equivalence classes of
polynomials modulo an irreducible polynomial over GF(p) are
said to be fields of characteristic p. Thus, GF($p^m$) is a
field of characteristic p for each choice of positive
integer m [Ref. 6].

B.   AN EXAMPLE OF THE CREATION OF A FIELD

Consider the Galois field GF($2^4$). It has $2^4 = 16$ elements
and may be constructed as the field of polynomials over
GF(2) modulo the irreducible polynomial $1+x+x^4$. If we let
$\beta$ represent a root of $1+x+x^4$, then it is also a
primitive element of the field. Field addition of the
elements is bit-by-bit modulo 2 addition, while
multiplication of the elements is described using the
primitivity of the element $\beta$. Thus,
$\beta^i * \beta^j = \beta^{i+j}$, where i+j is reduced modulo 15
if necessary. For example, given the field elements
$\beta^{13}$ and $\beta^{9}$, two of the 15 nonzero field
elements listed in Table I, we can easily demonstrate both
operations:

$$\beta^{13}+\beta^{9} = (\beta^3+\beta^2+1)+(\beta^3+\beta) = 2\beta^3+\beta^2+\beta+1 = \beta^2+\beta+1 = \beta^{10}$$

while

$$\beta^{13}*\beta^{9} = \beta^{13+9} = \beta^{22} = \beta^{22-15} = \beta^{7} = \beta^3+\beta+1.$$

TABLE I

REPRESENTATION OF GF($2^4$)

FIELD ELEMENT                        BETA POLYNOMIAL                          4-TUPLE

$\beta^{0} = 1$                      $1$                                      1 0 0 0
$\beta^{1} = \beta$                  $\beta$                                  0 1 0 0
$\beta^{2} = \beta^{2}$              $\beta^{2}$                              0 0 1 0
$\beta^{3} = \beta^{3}$              $\beta^{3}$                              0 0 0 1
$\beta^{4} = \beta+1$                $1+\beta$                                1 1 0 0
$\beta^{5} = \beta(\beta^{4})$       $\beta+\beta^{2}$                        0 1 1 0
$\beta^{6} = \beta(\beta^{5})$       $\beta^{2}+\beta^{3}$                    0 0 1 1
$\beta^{7} = \beta(\beta^{6})$       $1+\beta+\beta^{3}$                      1 1 0 1
$\beta^{8} = \beta(\beta^{7})$       $1+\beta^{2}$                            1 0 1 0
$\beta^{9} = \beta(\beta^{8})$       $\beta+\beta^{3}$                        0 1 0 1
$\beta^{10} = \beta(\beta^{9})$      $1+\beta+\beta^{2}$                      1 1 1 0
$\beta^{11} = \beta(\beta^{10})$     $\beta+\beta^{2}+\beta^{3}$              0 1 1 1
$\beta^{12} = \beta(\beta^{11})$     $1+\beta+\beta^{2}+\beta^{3}$            1 1 1 1
$\beta^{13} = \beta(\beta^{12})$     $1+\beta^{2}+\beta^{3}$                  1 0 1 1
$\beta^{14} = \beta(\beta^{13})$     $1+\beta^{3}$                            1 0 0 1
IV.   REED-SOLOMON CODES

A.       BACKGROUND 

Reed-Solomon (RS) codes are Bose-Chaudhuri-Hocquenghem
(BCH) codes over GF(q) of length q-1. They are
error-correcting codes which are used in many special-purpose
applications ranging from deep-space communications and
spread spectrum to digital audio disk systems and secure
data transmissions [Ref. 7]. These codes can correct both
random and burst errors over a communication channel and
hence are ideal for the real-time, high-reliability
requirements of these applications. The complexity of RS
encoders and decoders is proportional to the
error-correcting capability of the code, the speed of the
decoding, and the interleaving depth used [Ref. 8]. For truly
reliable communications there is a very strong tendency to
use RS codes with a large error-correcting capability and an
equally large interleaving level. Hence, one is especially
interested in minimizing the complexity of RS encoders and
decoders for communications and other pertinent
applications. Toward this end, there is considerable interest
in systolic array construction and eventual VLSI
implementation of RS encoders and decoders which yield
significant savings in size, weight, and power consumption
while simultaneously providing high reliability.

In this chapter we look at a generic construction and
architecture of an RS encoder developed by Johl [Ref. 9]
and use this design as a foundation for the subsequent
discussion and implementation in the later chapters. This
implementation utilizes a systolic architecture of identical
cells arranged in a linear array, each executing a
finite-field multiplication and addition in a pipelined
manner, thereby significantly increasing the throughput rate.
Also, since the layout of the cell need only be done once
and then replicated, it is extremely attractive for eventual
VLSI implementation.

B.   GENERIC  ARCHITECTURE 

The RS code is a block code which consists of symbols of
more than one bit. When each symbol is J bits wide, an RS
codeword has ($2^J - 1$) symbols. As depicted in Figure 4.1, an
RS code can be designed to be capable of correcting E errors
with each codeword consisting of I information symbols,
together with 2E parity or check symbols. As an example,
given the irreducible polynomial $1+x+x^4$, whose root $\beta$
satisfies $1+\beta+\beta^4 = 0$, and its corresponding finite
field as described in Table I, we are able to establish an
important foundation vital to the development of a generic
RS encoder. This RS code consists of a total of 15 four-bit
symbols for each codeword. If this particular code is to
correct one error, it needs two parity symbols and therefore
contains thirteen information symbols. This representation
is known as an RS (15,13) code, where the first integer
depicts the total number of symbols in the codeword, and the
second integer indicates the number of information symbols.
It is the responsibility of the encoder to use the
information symbols to generate the check or parity symbols
for the codeword. The information symbols are treated as
coefficients of a polynomial f(x),

$$f(x) = \sum_{i=1}^{2^J-1-2E} f_i \, x^{2^J-1-i}$$

where $f_i$ is the $i$th transmitted information symbol. The
corresponding generator polynomial is known as g(x),

$$g(x) = \prod_{i=1}^{2E} (x+\beta^i)$$

Then, the 2E parity symbols are defined as the coefficients
of the remainder of f(x)/g(x). Therefore, in the RS (15,13)
code previously mentioned

$$
\begin{aligned}
g(x) &= \prod_{i=1}^{2} (x+\beta^i) \\
     &= (x+\beta^1)(x+\beta^2) \\
     &= x^2+(\beta^1+\beta^2)x+\beta^3 \\
     &= x^2+\beta^5 x+\beta^3
\end{aligned}
$$
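The expansion of g(x) can be reproduced numerically. The following Python sketch is illustrative only and rests on assumptions not found in the thesis: field elements are 4-bit integers with bit k holding the coefficient of $\beta^k$, polynomials are coefficient lists ordered from the highest power down to the constant term, and gf_mul, poly_mul, and generator_poly are hypothetical helper names.

```python
# GF(2^4) tables for beta^4 = 1 + beta (bit k = coefficient of beta^k).
EXP = []
v = 1
for _ in range(15):
    EXP.append(v)
    v <<= 1
    if v & 0b10000:
        v ^= 0b10011
LOG = {e: i for i, e in enumerate(EXP)}

def gf_mul(a, b):
    """Multiply two field elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]

def poly_mul(a, b):
    """Multiply two polynomials with GF(2^4) coefficients."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)   # field addition is XOR
    return out

def generator_poly(two_e):
    """Product of (x + beta^i) for i = 1 .. 2E."""
    g = [1]
    for i in range(1, two_e + 1):
        g = poly_mul(g, [1, EXP[i % 15]])
    return g

g = generator_poly(2)
print([LOG[c] for c in g])   # exponents of the coefficients: [0, 5, 3]
```

The printed exponents correspond to $x^2+\beta^5 x+\beta^3$; calling generator_poly with a larger even argument gives the generator for a code that corrects more errors.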
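Furthermore, let us assume that the thirteen information
symbols are $\beta^{6}, \beta^{1}, \beta^{8}, \beta^{2}, \beta^{4}, \beta^{5}, \beta^{12}, \beta^{7}, \beta^{9}, \beta^{11}, \beta^{14}, \beta^{3}, \beta^{13}$. Then,

$$
\begin{aligned}
f(x) &= \sum_{i=1}^{13} f_i \, x^{15-i} \\
     &= f_1x^{14}+f_2x^{13}+f_3x^{12}+f_4x^{11}+f_5x^{10}+f_6x^{9}+f_7x^{8}+f_8x^{7}+f_9x^{6}+f_{10}x^{5} \\
     &\quad +f_{11}x^{4}+f_{12}x^{3}+f_{13}x^{2} \\
     &= \beta^{6}x^{14}+\beta^{1}x^{13}+\beta^{8}x^{12}+\beta^{2}x^{11}+\beta^{4}x^{10}+\beta^{5}x^{9}+\beta^{12}x^{8}+\beta^{7}x^{7}+\beta^{9}x^{6}+\beta^{11}x^{5} \\
     &\quad +\beta^{14}x^{4}+\beta^{3}x^{3}+\beta^{13}x^{2}
\end{aligned}
$$

Performing the required division, f(x)/g(x):

$$
\begin{array}{rl}
 & \beta^{6}x^{12}+\beta^{6}x^{11}+\beta^{0}x^{10}+\cdots+\beta^{14}x^{2}+\beta^{9}x+0 \\
\hline
x^{2}+\beta^{5}x+\beta^{3}\;\big) & \beta^{6}x^{14}+\beta^{1}x^{13}+\beta^{8}x^{12}+\cdots+\beta^{14}x^{4}+\beta^{3}x^{3}+\beta^{13}x^{2} \\
 & \beta^{6}x^{14}+\beta^{11}x^{13}+\beta^{9}x^{12} \\
\hline
 & \beta^{6}x^{13}+\beta^{12}x^{12}+\beta^{2}x^{11} \\
 & \beta^{6}x^{13}+\beta^{11}x^{12}+\beta^{9}x^{11} \\
\hline
 & \beta^{0}x^{12}+\beta^{11}x^{11}+\beta^{4}x^{10} \\
 & \beta^{0}x^{12}+\beta^{5}x^{11}+\beta^{3}x^{10} \\
\hline
 & \vdots \\
 & \beta^{14}x^{4}+\beta^{14}x^{3}+\beta^{13}x^{2} \\
 & \beta^{14}x^{4}+\beta^{4}x^{3}+\beta^{2}x^{2} \\
\hline
 & \beta^{9}x^{3}+\beta^{14}x^{2}+0x \\
 & \beta^{9}x^{3}+\beta^{14}x^{2}+\beta^{12}x \\
\hline
 & \beta^{12}x \\
 & 0x \\
\hline
 & \beta^{12}x+0
\end{array}
$$

Hence, the remainder we seek is $\beta^{12}$, 0 and thus the
corresponding 15-symbol codeword is
$\beta^{6}\ \beta^{1}\ \beta^{8}\ \beta^{2}\ \beta^{4}\ \beta^{5}\ \beta^{12}\ \beta^{7}\ \beta^{9}\ \beta^{11}\ \beta^{14}\ \beta^{3}\ \beta^{13}\ \beta^{12}\ 0$,
where the first thirteen symbols represent the
information symbols and the last two symbols represent the
parity symbols. [Ref. 9]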
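The division above can be replayed with a few lines of Python. This is a minimal sketch of ordinary polynomial long division, not the systolic design itself, and it makes the same assumptions as before: elements are 4-bit integers with bit k the coefficient of $\beta^k$, polynomials are coefficient lists from the highest power downward, and poly_divmod is a hypothetical helper name.

```python
# GF(2^4) tables for beta^4 = 1 + beta (bit k = coefficient of beta^k).
EXP = []
v = 1
for _ in range(15):
    EXP.append(v)
    v <<= 1
    if v & 0b10000:
        v ^= 0b10011
LOG = {e: i for i, e in enumerate(EXP)}

def gf_mul(a, b):
    """Multiply two field elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]

def poly_divmod(dividend, divisor):
    """Long division over GF(2^4); the divisor must be monic."""
    work = list(dividend)
    n_parity = len(divisor) - 1
    quotient = []
    for i in range(len(work) - n_parity):
        coef = work[i]                       # next quotient term
        quotient.append(coef)
        for j, gj in enumerate(divisor):
            work[i + j] ^= gf_mul(coef, gj)  # subtraction is XOR
    return quotient, work[-n_parity:]

# The thirteen information symbols of the example, as exponents of beta.
info = [EXP[e] for e in (6, 1, 8, 2, 4, 5, 12, 7, 9, 11, 14, 3, 13)]
g = [1, EXP[5], EXP[3]]                      # x^2 + beta^5 x + beta^3

# f(x) has zero coefficients for x^1 and x^0.
quotient, parity = poly_divmod(info + [0, 0], g)
codeword = info + parity

print([LOG.get(c) for c in parity])   # [12, None]: beta^12 and the zero symbol
```

Because addition and subtraction coincide in a field of characteristic 2, appending the remainder to the information symbols yields a codeword polynomial that is an exact multiple of g(x), which is what a decoder can test.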

The architecture of the systolic implementation consists
of a regular array of identical cells. Division is performed
in a pipelined manner by simultaneously entering the
highest-order terms of the f(x) and g(x) polynomials on
the leftmost cell and generating the appropriate codeword
on the far right, as depicted in Figure 4.2. In fact, a
codeword can immediately follow the previous one without any
interruption in the pipeline flow. Likewise, the control is
also systolic. One control bit pipeline path signals the
start of a new codeword; another signals the start of the
division operation. Meanwhile, each cell of the array holds
one term of the quotient. As a result, if d represents the
difference in degrees between the two polynomials, then

d = deg f(x) - deg g(x)

and thus d+1 cells are required. For example,

deg f(x) = 14
deg g(x) = 2

d = 12 (degree of quotient)

From our previous calculation, the quotient was
$\beta^{6}x^{12}+\beta^{6}x^{11}+\beta^{0}x^{10}+\cdots+\beta^{14}x^{2}+\beta^{9}x+0$.
Since it consists of thirteen terms, thirteen cells would be
needed. In general,

deg f(x) = $2^J - 2$
deg g(x) = 2E
d = $2^J - 2E - 2$

and so the total number of cells required is d+1 or $2^J - 2E - 1$.
[Ref. 9]
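The cell count follows directly from the two degrees, as the small Python sketch below shows. The function name cells_required and its parameter names are hypothetical and serve only to restate the formula.

```python
def cells_required(j_bits, e_errors):
    """Number of systolic cells, d + 1, for J-bit symbols and E correctable errors."""
    deg_f = 2**j_bits - 2      # degree of the information polynomial f(x)
    deg_g = 2 * e_errors       # degree of the generator polynomial g(x)
    d = deg_f - deg_g          # degree of the quotient
    return d + 1

print(cells_required(4, 1))    # 13, the RS (15,13) example
```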

The operation of each cell is simple and regular.
Essentially, it accomplishes one line of the normal division
by initially determining the specific term of the quotient,
multiplying by the divisor, subtracting the result from the
dividend, and finally passing along the divisor and partial
result to the next cell. More specifically, there are three
J-bit data paths and two 1-bit control paths, as shown in
Figure 4.3. The function of the C data path is to allow the
information symbols to pass through the array unchanged
while the other two data paths, A and B, are for the
dividend and divisor, respectively. The register Q is set
at the start of the division, and remains the same
throughout the polynomial division of one block. The
register B is used as a temporary storage device. While a
control bit accompanies the first byte of information to
signal the start of a new codeword, a preceding start bit,
at one-half the rate of the control bit, initiates the
division operation in each cell. In short, the above
architecture is simply a pipelined parallel processor which
is composed of a systolic array of identical cells, each
performing a finite-field multiplication and addition.
Since the layout is simple and regular, it is easily
replicated and economical to produce. [Ref. 9]

[Figure 4.3 ports: A, B, C, CONTROL, and START, each with an
input (in) and an output (out). Legend: A, used for dividend;
B, used for divisor; C, used for information symbols; Q,
division register; S, start register; CONTROL, used to start
division; symbols for the finite field multiplier and the
finite field adder.]

Figure 4.3   The Systolic Cell Structure
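The division of labor among the cells can be illustrated with a purely functional Python model. It is a sketch under strong simplifying assumptions and should not be read as the design of [Ref. 9]: a sequential loop stands in for the pipeline, the C data path and the control and start bits are omitted, and the class Cell and function systolic_divide are hypothetical names. Each cell latches one quotient term in its Q register and performs one line of the long division.

```python
# GF(2^4) tables for beta^4 = 1 + beta (bit k = coefficient of beta^k).
EXP = []
v = 1
for _ in range(15):
    EXP.append(v)
    v <<= 1
    if v & 0b10000:
        v ^= 0b10011
LOG = {e: i for i, e in enumerate(EXP)}

def gf_mul(a, b):
    """Multiply two field elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]

class Cell:
    """One cell: holds a quotient term and does one multiply-add line."""
    def __init__(self):
        self.q = 0                      # Q register

    def process(self, window, divisor):
        self.q = window[0]              # divisor is monic, so Q is the leading term
        # Multiply the divisor by Q and subtract (XOR) from the partial dividend.
        return [a ^ gf_mul(self.q, b) for a, b in zip(window, divisor)]

def systolic_divide(dividend, divisor):
    width = len(divisor)
    cells = [Cell() for _ in range(len(dividend) - width + 1)]   # d + 1 cells
    partial = list(dividend)
    for k, cell in enumerate(cells):    # cell k works on terms k .. k + width - 1
        partial[k:k + width] = cell.process(partial[k:k + width], divisor)
    return [c.q for c in cells], partial[len(cells):]

info = [EXP[e] for e in (6, 1, 8, 2, 4, 5, 12, 7, 9, 11, 14, 3, 13)]
g = [1, EXP[5], EXP[3]]
quotient, remainder = systolic_divide(info + [0, 0], g)
print(len(quotient), [LOG.get(c) for c in remainder])   # 13 [12, None]
```

In hardware the cells would operate concurrently on successive symbols; the loop order here only preserves the data dependence between one cell and the next.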

In Chapter VI the encoder and decoder for an RS code are
described in greater detail with the encoding and decoding
process carried out for a specific example.

V.   IMPLEMENTATION THEORY

A.  BACKGROUND 

In  this  chapter  we  look  at  the  theoretical  concepts 
behind  the  systolic  implementation  of  an  encoder  and 
decoder. We then apply these concepts to the actual
implementation in the subsequent chapter. There, the binary case
is  initially  presented  because  of  its  simple  architecture 
and  ease  of  understanding.  It  is  then  followed  by  the  more 
intricate   and    complex    Reed- Solomon   case. 

In this chapter we also discuss in depth the design of
a systolic array multiplier used in the RS encoder. Unlike
the binary case, which deals only with the elements 0 and 1
in the complete codeword, the Reed-Solomon codeword
contains symbols which lie in a larger field than GF(2). As
a result, the systolic array multiplier is more detailed
and complicated than the binary design, which simply uses
a primitive binary shift register scheme.

B.  PRIMITIVE  BINARY  SHIFT  REGISTER  DESIGN 

A primitive binary shift register is a series of registers,
each capable of containing a zero or a one. The contents of
the registers all shift on a designated time signal supplied
by an external clock. The content of the newest stage of the
register is defined as a function of the current contents of
the register. Because these shift registers utilize this
feedback property they are commonly referred to as feedback
shift registers, or primitive shift registers, since the
feedback is usually described by a primitive polynomial
[Ref. 10]. For example, the diagram in Figure 5.1 describes
a primitive shift register composed of four registers,
labeled $1, x, x^2, x^3$, and one modulo 2 adder situated
between registers 1 and x. Each register is capable of
storing one bit of binary information, i.e., a "1" or a "0".
The all-zero contents of the register is typically
prohibited. This restriction is placed on the primitive
shift register to ensure a change of state when a new clock
signal is received. As the register steps from state to
state, the length of a primitive cycle is independent of
its initial state and is equal to $2^m - 1$. The primitive
shift register of Figure 5.1 will move through 15 distinct
binary patterns before repeating (see Table II). This
primitive shift register is said to have a cycle length of
$2^4 - 1$ or 15. Moreover, since all nonzero patterns are
included in the cycle, it is called a maximum-length cycle.
In general, a primitive shift register composed of m stages
will generate a maximum-length cycle of period $2^m - 1$. It
is possible for each value of m to determine a primitive
feedback function for the shift register so that a
maximum-length shift register sequence of period $2^m - 1$ is
generated.
TABLE II
REGISTER CONTENTS AFTER SUCCESSIVE CLOCK SIGNALS

REGISTER CONTENTS      BETA POLYNOMIAL

1 0 0 0                $1$
0 1 0 0                $\beta$
0 0 1 0                $\beta^{2}$
0 0 0 1                $\beta^{3}$
1 1 0 0                $\beta^{4}$
0 1 1 0                $\beta^{5}$
0 0 1 1                $\beta^{6}$
1 1 0 1                $\beta^{7}$
1 0 1 0                $\beta^{8}$
0 1 0 1                $\beta^{9}$
1 1 1 0                $\beta^{10}$
0 1 1 1                $\beta^{11}$
1 1 1 1                $\beta^{12}$
1 0 1 1                $\beta^{13}$
1 0 0 1                $\beta^{14}$
1 0 0 0                $\beta^{15}$

Maximum-length cycles and maximum-length sequences have
broad applications in data communication systems and
computer simulation, while primitive shift registers designed
as division circuits have applications in coding theory
[Ref. 10]. It is the objective of this chapter to utilize
the concepts of the latter to propose an RS encoder and
decoder.

In order to generate a maximum-length cycle or sequence we
need to understand the necessary component connections given a
primitive polynomial. That is, given an arbitrary primitive
polynomial, how do we design the shift register? For the
example of Figure 5.1, assume p(x)=1+x+x^4 is a primitive
polynomial over GF(2). We can consider GF(2^4) as an algebra
of polynomials modulo p(x)=1+x+x^4 and design a register to
produce a pattern cycle of length 2^4-1. Using four delay
units (since we need a register unit for the coefficient of
each term x^t with 0 ≤ t ≤ 3) we need only decide how the
primitive polynomial affects the feedback to know where to
place the modulo 2 adder components and where to make the
necessary circuit connections. The feedback is the coefficient
of x^4, but in this polynomial algebra x^4=1+x. Thus, the
feedback goes to the registers which contain the coefficients
of the x^0 and x^1 terms. Making these connections and
supplying the modulo 2 adder component where we have two
inputs to the register, we arrive at the shift register given
in Figure 5.1. Then each step of the register is equivalent to
multiplying the contents of the register by the primitive
element β. Thus, the sequence of contents is the sequence of
powers of β modulo (1+β+β^4). In this way multiplication of
the elements of the field is produced simply as described in
Chapter III, and the powers of β are as given in Table I of
Chapter III.
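
The register behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of Figure 5.1 itself: the function names `lfsr_step` and `lfsr_cycle` are our own, and the state is assumed to be stored as the coefficients of 1, x, x^2, x^3 in that order, as in Table II.

```python
def lfsr_step(state):
    """One clock of the 4-stage register for p(x) = 1 + x + x^4.

    state = (s0, s1, s2, s3) holds the coefficients of 1, x, x^2, x^3.
    """
    s0, s1, s2, s3 = state
    feedback = s3  # coefficient that would move into the x^4 position
    # x^4 = 1 + x, so the feedback is added into the x^0 and x^1 stages
    return (feedback, s0 ^ feedback, s1, s2)


def lfsr_cycle(start=(1, 0, 0, 0)):
    """Return the successive register contents until the start state recurs."""
    states = [start]
    state = lfsr_step(start)
    while state != start:
        states.append(state)
        state = lfsr_step(state)
    return states
```

Calling `lfsr_cycle()` returns a list whose entry k is the register content for β^k, so printing the list row by row reproduces the first fifteen rows of Table II.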

C.   CODING  THEORY 

Suppose that we wish to transmit a sequence of binary digits
across a noisy channel. If we send a one, a one will probably
be received, and if we send a zero, a zero will probably be
received. Occasionally, the channel noise will cause a
transmitted one to be received as a zero or a transmitted zero
to be received as a one. Although we are unable to prevent the
channel from generating such errors, we can reduce their
undesirable effects with the use of coding [Ref. 5]. The basic
idea is simple. A set of k message digits which we wish to
transmit is concatenated to r check digits. The entire block
of n=k+r channel digits then forms the transmitted codeword.
Assuming that the channel noise changes sufficiently few of
these n transmitted channel digits, the redundancy afforded by
the r check digits provides the receiver with sufficient
information to detect and correct the channel errors.
Figure 5.2 illustrates the basic idea of the encoding process
for an (n,k) encoder with n=15 and k=11. The codeword is
constructed in such a way that the message digits appear at
the far right. The error correcting capability of the
generated code depends upon the number of check bits added. To
illustrate, the binary code constructed using the encoder of
Figure 5.2 is capable of correcting one error when, for
example, n=2^m-1 and k=2^m-1-m for each integer m ≥ 2, the
so-called Hamming single-error-correcting code.

D.   MINIMAL  POLYNOMIALS 

In order for a code to correct every pattern of t or fewer
channel errors, the codewords must be generated by a
polynomial which is the product of at least t distinct minimal
polynomials [Ref. 5]. Occasionally, extra error correcting
capability is possessed by words of a code beyond the designed
capacity of the code. To understand this situation and the
general error correcting capacity of the code, it is necessary
that we discuss some of the mathematical concepts and
properties of minimal polynomials before discussing the actual
implementation.

A minimal polynomial for a primitive element β over GF(p) is
the lowest degree irreducible monic (has leading coefficient
1) polynomial M(x) with coefficients from GF(p) such that
M(β)=0 [Ref. 11]. For example, the Galois field GF(2^4) is
constructed using the primitive element β, the root of the
irreducible polynomial 1+x+x^4. Then the minimal polynomials
for the elements β, β^3, β^5, and β^7 are given in the
following table.

Element    Minimal Polynomial

0          x
1          1+x
β          1+x+x^4
β^3        1+x+x^2+x^3+x^4
β^5        1+x+x^2
β^7        1+x^3+x^4

Furthermore, in GF(2^m) β^i and β^(2i) have the same minimal
polynomial. In general, if β^i is a root of a minimal
polynomial then so is β^(pi) (where p is the characteristic of
the ground field GF(p); in this case p=2). To illustrate, let
us substitute the elements β and β^2 into our minimal
polynomial 1+x+x^4. Upon substitution of β we obtain 1+β+β^4.
Since β^4=β+1 in GF(2^4), this sum is zero and M(β)=0.
Likewise, upon substituting β^2 for x in the same minimal
polynomial we obtain 1+β^2+β^8, which in GF(2^4) is also zero
as can be seen in Table I of Chapter III. Elements of the
field with the same minimal polynomial are called conjugates.
In the same way, the imaginary roots i and -i are referred to
as conjugate complex numbers, since they both have the same
minimal polynomial x^2+1 over the reals [Ref. 11].
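
The second substitution can also be checked by hand without consulting the table. The short derivation below is our own worked step; it uses only the relation β^4 = 1+β and the fact that coefficients are added modulo 2.

```latex
\begin{aligned}
\beta^{8} &= (\beta^{4})^{2} = (1+\beta)^{2} = 1 + 2\beta + \beta^{2} = 1 + \beta^{2},\\
1 + \beta^{2} + \beta^{8} &= 1 + \beta^{2} + 1 + \beta^{2} = 0 .
\end{aligned}
```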

From our preceding discussion, it is clear that β, β^2,
(β^2)^2 = β^4, and (β^4)^2 = β^8 all have the same minimal
polynomial 1+x+x^4. Likewise β^3, β^6, β^12, and β^24 = β^9
also have the same minimal polynomial 1+x+x^2+x^3+x^4. We see
that the powers of β fall into disjoint sets, called
cyclotomic cosets. In fact, all β^j whose exponents are
elements of the same cyclotomic coset have the same minimal
polynomial. The cyclotomic coset containing β^s consists of
the powers of β with the following exponents:

$$\{\, s,\; 2s,\; 2^{2}s,\; 2^{3}s,\; \ldots,\; 2^{m_s-1}s \,\}$$

where m_s is the smallest positive integer such that
2^(m_s) s ≡ s (mod 2^m-1) [Ref. 11]. For example, the
cyclotomic cosets over GF(2^4) are:

C_0 = {0}
C_1 = {1,2,4,8}
C_3 = {3,6,12,9}
C_5 = {5,10}
C_7 = {7,14,13,11}

Other    cyclotomic    coset   decompositions    for    various   values   of 
m   are    listed    in    Table    III. 
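
The coset decompositions can be generated mechanically by repeated doubling of the exponent modulo 2^m-1. The following is a minimal sketch under that definition; the function name `cyclotomic_cosets` and the choice to key each coset by its smallest element are our own conventions.

```python
def cyclotomic_cosets(m, p=2):
    """Cyclotomic cosets of p modulo p**m - 1, keyed by smallest element.

    Each coset is listed in the order s, p*s, p^2*s, ... (mod p**m - 1).
    """
    n = p**m - 1
    cosets = {}
    seen = set()
    for s in range(n):
        if s in seen:
            continue
        coset = [s]
        j = (s * p) % n
        while j != s:
            coset.append(j)
            j = (j * p) % n
        seen.update(coset)
        cosets[s] = coset
    return cosets
```

For example, `cyclotomic_cosets(4)` gives the five cosets listed above for GF(2^4), and `cyclotomic_cosets(5)` and `cyclotomic_cosets(6)` give the decompositions for GF(2^5) and GF(2^6).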

If we let M^(i)(x) represent the minimal polynomial of
β^i ∈ GF(p^m), it follows that if i is in the cyclotomic coset
C_s, then

$$M^{(i)}(x) = \prod_{j \in C_s} \bigl(x + \beta^{j}\bigr)$$
TABLE III
CYCLOTOMIC COSETS

OVER GF(2^3)              OVER GF(2^5)

C_0 = {0}                 C_0  = {0}
C_1 = {1,2,4}             C_1  = {1,2,4,8,16}
C_3 = {3,6,5}             C_3  = {3,6,12,24,17}
                          C_5  = {5,10,20,9,18}
                          C_7  = {7,14,28,25,19}
                          C_11 = {11,22,13,26,21}
                          C_15 = {15,30,29,27,23}

OVER GF(2^6)

C_0  = {0}
C_1  = {1,2,4,8,16,32}
C_3  = {3,6,12,24,48,33}
C_5  = {5,10,20,40,17,34}
C_7  = {7,14,28,56,49,35}
C_9  = {9,18,36}
C_11 = {11,22,44,25,50,37}
C_13 = {13,26,52,41,19,38}
C_15 = {15,30,60,57,51,39}
C_21 = {21,42}
C_23 = {23,46,29,58,53,43}
C_27 = {27,54,45}
C_31 = {31,62,61,59,55,47}
which is analogous to the generator polynomial g(x) in our
generic architecture of the previous chapter. Moreover, by
utilizing various techniques beyond the scope of this thesis,
we may determine all the minimal polynomials of elements in
GF(2^4), as depicted in Table IV. Using this table we may
construct all the Reed-Solomon codes of block length 15 which
correct t or fewer channel errors. These codes have the
following generator polynomials:

t=1    g(x) = M^(1)(x) = 1+x+x^4

t=2    g(x) = M^(1)(x)*M^(3)(x) = 1+x^4+x^6+x^7+x^8

t=3    g(x) = M^(1)(x)*M^(3)(x)*M^(5)(x) = 1+x+x^2+x^4+x^5+x^8+x^10
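
The generator polynomials listed above can be reproduced numerically by multiplying out the factors (x + β^j) over each cyclotomic coset. The sketch below is one way to do so and is not taken from the references: the names `gf_mul`, `poly_mul`, `minimal_poly`, and `generator` are hypothetical, field elements are assumed to be stored as 4-bit integers with bit i holding the coefficient of β^i, and polynomials are coefficient lists with the lowest degree first.

```python
M, PRIM = 4, 0b10011  # GF(2^4) with p(x) = 1 + x + x^4
N = (1 << M) - 1

# EXP[i] holds beta^i as a 4-bit integer; LOG is the inverse lookup
EXP = [1]
for _ in range(N - 1):
    v = EXP[-1] << 1
    if v & (1 << M):
        v ^= PRIM
    EXP.append(v)
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % N]


def poly_mul(f, g):
    """Product of two polynomials with GF(2^4) coefficients."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] ^= gf_mul(a, b)
    return out


def coset(s):
    """Cyclotomic coset of s modulo 15, in doubling order."""
    c, j = [s], (2 * s) % N
    while j != s:
        c.append(j)
        j = (2 * j) % N
    return c


def minimal_poly(i):
    """M^(i)(x) as the product of (x + beta^j) over the coset containing i."""
    poly = [1]
    for j in coset(i % N):
        poly = poly_mul(poly, [EXP[j], 1])
    return poly


def generator(t):
    """Product of the distinct minimal polynomials of beta, ..., beta^(2t)."""
    g, used = [1], set()
    for i in range(1, 2 * t + 1):
        rep = min(coset(i))
        if rep not in used:
            used.add(rep)
            g = poly_mul(g, minimal_poly(i))
    return g
```

Although the arithmetic is carried out in GF(2^4), every coefficient of the results is 0 or 1, so `generator(2)` can be read off directly as the polynomial 1+x^4+x^6+x^7+x^8.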

Hence, the t-error-correcting RS code of block length n is
then the cyclic code whose generator polynomial is the product
of the distinct minimal polynomials of β, β^2, β^3, ...,
β^(2t-1), β^(2t) [Ref. 5]. Of noteworthy interest is the fact
that an RS code over GF(2^5) which is designed to correct up
to 4 errors is also able to correct 5 errors. This is because
M^(9)(x), the minimal polynomial of β^9, is identical to
M^(5)(x), the minimal polynomial of β^5. Similarly, the 6
error-correcting RS code is identical to the 7
error-correcting code just as the 8-to-14 error-correcting
codes of length 31 are all identical to the 15
error-correcting code. In a similar way, codes over GF(2^6)
and GF(2^7) are sometimes able to correct more errors than
they are designed to correct. The ability to correct these
extra
TABLE IV
MINIMAL POLYNOMIALS OF ELEMENTS IN GF(2^4)

M^(1)(x) = M^(2)(x)  = M^(4)(x)  = M^(8)(x)  = 1+x+x^4

M^(3)(x) = M^(6)(x)  = M^(12)(x) = M^(9)(x)  = 1+x+x^2+x^3+x^4

M^(5)(x) = M^(10)(x) = 1+x+x^2

M^(7)(x) = M^(14)(x) = M^(13)(x) = M^(11)(x) = 1+x^3+x^4


error patterns depends upon finding higher powers of β which
belong to the cyclotomic cosets of the smaller powers of β
already included in the code for its designed error correcting
distance. The tables of cyclotomic cosets for GF(2^5) and
GF(2^6) show, for example, that over GF(2^5) β^9 belongs to
the coset of β^5, while over GF(2^6) β^17 belongs to the coset
of β^5 and β^19 belongs to the coset of β^13, etc. See
[Ref. 11] for further discussion of the error correcting
capabilities of given error correcting codes.
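
The coset argument can be turned into a small calculation. The sketch below is our own illustration, with the hypothetical name `actual_capability`: it collects the cosets of 1, 2, ..., 2t and then counts how many further consecutive exponents are already covered. It considers only this consecutive-root argument from the text and assumes 2t is less than the block length.

```python
def coset(s, n):
    """Cyclotomic coset of s modulo n (doubling map), as a set."""
    c, j = {s}, (2 * s) % n
    while j != s:
        c.add(j)
        j = (2 * j) % n
    return c


def actual_capability(t, m):
    """Largest t' such that beta, ..., beta^(2t') are all roots of the
    generator polynomial designed for t errors (block length 2^m - 1)."""
    n = 2**m - 1
    roots = set()
    for i in range(1, 2 * t + 1):
        roots |= coset(i, n)
    t_actual = t
    # extend while the next two exponents are already roots
    while 2 * (t_actual + 1) < n and {2 * t_actual + 1, 2 * t_actual + 2} <= roots:
        t_actual += 1
    return t_actual
```

With m=5 (block length 31) this returns 5 for a designed capability of 4, 7 for a designed capability of 6, and 15 for any designed capability from 8 to 14, in agreement with the statements above.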

E.   SYSTOLIC  ARRAY  MULTIPLIER 

As mentioned earlier in this chapter, the systolic array
multiplier used in the generation of Reed-Solomon codewords is
much more complex than in the binary case. In this section, we
discuss the design of a systolic array multiplier developed by
Yeh, Reed, and Truong [Ref. 7] to assist us in our
implementation of an RS encoder.

According to [Ref. 7] several circuits have been proposed to
realize multiplication in GF(2^m). Unfortunately, these
circuits are not suited for use in VLSI systems, due to
irregular wire routing, complicated control problems,
nonmodular structure and lack of concurrency. The systolic
array multiplier of [Ref. 7] performs the multiplication in
the field GF(2^m) in a way which overcomes some of these
unwanted attributes.

The systolic architecture is developed for performing the
product-sum computation, AB+C, in the finite field GF(2^m) of
2^m elements, where A, B, and C are arbitrary elements of the
field. The multiplier is a serial-in, serial-out,
one-dimensional systolic array which requires m basic cells.
To perform an isolated computation the multiplier requires 3m
time units; however, the average time per computation is only
m time units if a number of computations are carried out
consecutively. Because the architecture is simple and regular
and possesses the desirable properties of concurrency and
modularity, it is well suited for VLSI implementation.
[Ref. 7]

Consider the nonzero elements of GF(2^m). They can be
represented as the powers of β, a primitive element of the
field as discussed in Chapter III. Since F(β)=0,
β^m = f_(m-1) β^(m-1) + ... + f_1 β + f_0, where the
coefficients f_i are determined by the polynomial F(x) which β
satisfies. Therefore an element of GF(2^m) is of the form
a_(m-1) β^(m-1) + ... + a_1 β + a_0 where a_i ∈ GF(2) for
0 ≤ i ≤ m-1. In the following discussion, the polynomial
representation is used to represent the finite field GF(2^m).

Let A = a_(m-1) β^(m-1) + ... + a_1 β + a_0 and
B = b_(m-1) β^(m-1) + ... + b_1 β + b_0 be two elements in
GF(2^m). Then A+B = s_(m-1) β^(m-1) + ... + s_1 β + s_0, where
s_i = a_i + b_i (mod 2) for 0 ≤ i ≤ m-1. Therefore addition in
GF(2^m) is realized easily by m independent Exclusive-OR
gates.

Suppose P = p_(m-1) β^(m-1) + ... + p_1 β + p_0 is the product
of A and B, i.e., P=AB. Then P can be written as follows:
$$P = AB = \sum_{k=0}^{m-1} \bigl(A\beta^{k}\bigr)\, b_k
  = \sum_{k=0}^{m-1} \Bigl( \sum_{n=0}^{m-1} a_n^{(k)} \beta^{n} \Bigr) b_k
  = \sum_{n=0}^{m-1} \Bigl( \sum_{k=0}^{m-1} a_n^{(k)} b_k \Bigr) \beta^{n} \tag{1}$$

where a_n^(k) is the coefficient of β^n in Aβ^k, i.e.,
Aβ^k = a_(m-1)^(k) β^(m-1) + ... + a_1^(k) β + a_0^(k) for
0 ≤ k ≤ m-1. From equation (1) we obtain
p_n = a_n^(0) b_0 + a_n^(1) b_1 + ... + a_n^(m-2) b_(m-2)
+ a_n^(m-1) b_(m-1). The computation of Aβ^k can be performed
recursively on k for 0 ≤ k ≤ m-1. Initially for k=0, Aβ^0=A,
i.e., a_n^(0)=a_n for 0 ≤ n ≤ m-1. For 1 ≤ k ≤ m-1,


$$A\beta^{k} = \bigl(A\beta^{k-1}\bigr)\beta
  = \sum_{n=0}^{m-1} a_n^{(k-1)} \beta^{n+1}
  = a_{m-1}^{(k-1)} \beta^{m} + \sum_{n=1}^{m-1} a_{n-1}^{(k-1)} \beta^{n} \tag{2}$$


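Substituting β^m = f_(m-1) β^(m-1) + ... + f_1 β + f_0 into
equation (2) yields

$$A\beta^{k} = \sum_{n=1}^{m-1} \bigl( a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \bigr) \beta^{n} + a_{m-1}^{(k-1)} f_0 \tag{3}$$

From equation (3), we obtain

$$a_n^{(k)} = a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \quad \text{for } 1 \le n \le m-1, \qquad a_0^{(k)} = a_{m-1}^{(k-1)} f_0 \tag{4}$$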
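
The recursion for the coefficients of Aβ^k, together with the accumulation of the partial sums of P, can be sketched in software as follows. This is only a bit-level model of the arithmetic, not of the systolic timing; the function name `product_sum` and the list-of-bits representation (index n holds the coefficient of β^n) are our own assumptions.

```python
def product_sum(a, b, c, f):
    """Return the bits of P = A*B + C in GF(2^m).

    a, b, c: lists of m bits, index n = coefficient of beta^n.
    f: bits f_0 .. f_(m-1), where beta^m = f_(m-1) beta^(m-1) + ... + f_0.
    """
    m = len(f)
    a_k = list(a)  # coefficients of A * beta^0
    p = list(c)    # the partial sums start from C
    for k in range(m):
        if k > 0:
            top = a_k[m - 1]  # coefficient that overflows into beta^m
            # shift up by one power of beta and reduce beta^m using f
            a_k = [top & f[0]] + [a_k[n - 1] ^ (top & f[n]) for n in range(1, m)]
        # accumulate (coefficient of beta^n in A*beta^k) * b_k into p_n
        p = [p[n] ^ (a_k[n] & b[k]) for n in range(m)]
    return p
```

For GF(2^4) with 1+x+x^4 the feedback bits are `f = [1, 1, 0, 0]`. Taking A = β^5 (`[0, 1, 1, 0]`), B = β^7 (`[1, 1, 0, 1]`), and C = 0 gives `[1, 1, 1, 1]`, which is β^12.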

Table V indicates the step-by-step procedure for computing
P=AB+C in GF(2^4). In Table V a_n^(k), b_n, c_n, f_n, and p_n
are the n-th bits of Aβ^k, B, C, F, and P, respectively, where
F is the primitive polynomial and p_n^(i) is the partial sum
of p_n.

Figure 5.3 depicts the systolic multiplier for our given
finite field. The primitive polynomial is
F = f_3 β^3 + f_2 β^2 + f_1 β + f_0. Input d_n receives the
bit b_n of B. The n-th bits c_n, a_n, and f_n of C, A, and F,
respectively, are received serially at inputs e_0, g_0, and
h_0. Two control signals, START (0001) and END (0111), are
used in the design, with inputs r_0 and t_0 receiving the
signals, respectively.

Output e_4 serially transmits the n-th bit, p_n, of the result
P out of the system. The order of the inputs and the outputs
is also shown in Figure 5.3. The flip-flops (FF) associated
with inputs t_0 and h_0 are used for the purpose of
synchronization.

The circuit of cell L_i is shown in Figure 5.4. The operation
of the flip-flops in the system is synchronized implicitly by
a clock signal. When r_i* = 1, u_i = g_i* at the next time
unit (through switch SW). Additionally, when
TABLE V
COMPUTATION OF P = AB + C IN GF(2^4)

STEP
NUMBER  OPERATIONS

  1     p_3^(0) = c_3                            a_3^(0) = a_3

  2     p_3^(1) = p_3^(0) + a_3^(0) b_0          a_2^(0) = a_2
        p_2^(0) = c_2

  3     p_2^(1) = p_2^(0) + a_2^(0) b_0          a_1^(0) = a_1
        p_1^(0) = c_1                            a_3^(1) = a_2^(0) + a_3^(0) f_3

  4     p_3^(2) = p_3^(1) + a_3^(1) b_1          a_0^(0) = a_0
        p_1^(1) = p_1^(0) + a_1^(0) b_0          a_2^(1) = a_1^(0) + a_3^(0) f_2
        p_0^(0) = c_0

  5     p_2^(2) = p_2^(1) + a_2^(1) b_1          a_1^(1) = a_0^(0) + a_3^(0) f_1
        p_0^(1) = p_0^(0) + a_0^(0) b_0          a_3^(2) = a_2^(1) + a_3^(1) f_3

  6     p_3^(3) = p_3^(2) + a_3^(2) b_2          a_0^(1) = a_3^(0) f_0
        p_1^(2) = p_1^(1) + a_1^(1) b_1          a_2^(2) = a_1^(1) + a_3^(1) f_2

  7     p_2^(3) = p_2^(2) + a_2^(2) b_2          a_1^(2) = a_0^(1) + a_3^(1) f_1
        p_0^(2) = p_0^(1) + a_0^(1) b_1          a_3^(3) = a_2^(2) + a_3^(2) f_3

  8     p_3 = p_3^(4) = p_3^(3) + a_3^(3) b_3    a_0^(2) = a_3^(1) f_0
        p_1^(3) = p_1^(2) + a_1^(2) b_2          a_2^(3) = a_1^(2) + a_3^(2) f_2

  9     p_2 = p_2^(4) = p_2^(3) + a_2^(3) b_3    a_1^(3) = a_0^(2) + a_3^(2) f_1
        p_0^(3) = p_0^(2) + a_0^(2) b_2

 10     p_1 = p_1^(4) = p_1^(3) + a_1^(3) b_3    a_0^(3) = a_3^(2) f_0

 11     p_0 = p_0^(4) = p_0^(3) + a_0^(3) b_3
r_i* = 0, u_i retains its value. Two principal operations of
the system are the following:

    e_(i+1)   ←  (g_i* d_i) ⊕ e_i*

    g_(i+1)*  ←  (u_i h_i*) ⊕ (g_i* t_i*)

where 0 ≤ i ≤ 3, ⊕ denotes the Exclusive-OR operation, i.e.,
modulo-2 addition, and the backwards arrow denotes the
substitution operation.

A comparison of the procedure in Table V and the structure in
Figures 5.3 and 5.4 yields the following facts: The signal u_i
in L_i is equal to a_3^(i) in Aβ^i. The signal g_i is equal to
a_n^(i) in Aβ^i for some n. The signal e_i* is equal to the
partial sum of AB+C.

The multiplier in Figure 5.3 can be generalized to the finite
field GF(2^m) by simply concatenating m identical cells.
Furthermore, additional registers and control signals would be
required if the b_i's are fed serially into the system in the
same manner as the a_i's. [Ref. 7]
VI.  IMPLEMENTATION

A.   BINARY ENCODER

In this section we discuss the encoding process for a binary
code and utilize a primitive shift register design to
implement both a single-error-correcting binary encoder and a
double-error-correcting binary encoder.

1.  Encoding Process

As discussed in Chapter V, an (n,k) code can be generated with
a polynomial of degree n-k. If the polynomial is primitive of
degree r and n=2^r-1, the code can be encoded and decoded with
primitive shift registers. Hence, we restrict our attention
solely to the case of primitive polynomials.

We illustrate this procedure by generating the (15,11) binary
code using the primitive polynomial p(x)=1+x+x^4. Here n-k=4,
r=4, n=2^4-1=15, and k=11. The encoding process for the 11-bit
message 10101010101 proceeds as in the example below.

Example of Encoding Process:
Message = 10101010101

1)  Represent the message as a polynomial.
        m(x) = 1+x^2+x^4+x^6+x^8+x^10

2)  Multiply m(x) by x^(n-k) to shift the message digits to
    the far right.
        x^4 m(x) = x^4+x^6+x^8+x^10+x^12+x^14

3)  Calculate the remainder when x^(n-k) m(x) is divided by
    p(x).
        r(x) = 1+x+x^3

4)  Form the code polynomial as the sum x^(n-k) m(x)+r(x), a
    multiple of p(x).
        c(x) = 1+x+x^3+x^4+x^6+x^8+x^10+x^12+x^14

Code Word = 110110101010101
Note that codewords in this code are formed as multiples of
the primitive generating polynomial p(x). As p(x) is of degree
r there are n-r=k information symbols which can be chosen
freely, and then r check symbols are chosen so that the
resulting codeword satisfies this criterion, namely that the
codewords are multiples of the generator polynomial. In other
words, the check digits are the coefficients of the remainder
r(x) upon division of x^(n-k) m(x) by p(x) as shown below.

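                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + 1
                 ---------------------------------------
    x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^4
                  x^14 + x^11 + x^10
                  ------------------
                  x^12 + x^11 + x^8
                  x^12 + x^9 + x^8
                  ------------------
                  x^11 + x^9 + x^6
                  x^11 + x^8 + x^7
                  ------------------
                  x^9 + x^8 + x^7 + x^6 + x^4
                  x^9 + x^6 + x^5
                  ------------------
                  x^8 + x^7 + x^5 + x^4
                  x^8 + x^5 + x^4
                  ------------------
                  x^7
                  x^7 + x^4 + x^3
                  ------------------
                  x^4 + x^3
                  x^4 + x + 1
                  ------------------
                  x^3 + x + 1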
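
The four encoding steps and the long division can be checked with a short program. The sketch below is illustrative only; the names `poly_divmod` and `encode` are hypothetical, and polynomials over GF(2) are assumed to be stored as Python integers with bit i holding the coefficient of x^i.

```python
def poly_divmod(dividend, divisor):
    """Long division over GF(2); returns (quotient, remainder)."""
    quotient = 0
    deg_divisor = divisor.bit_length() - 1
    while dividend and dividend.bit_length() - 1 >= deg_divisor:
        shift = dividend.bit_length() - 1 - deg_divisor
        quotient |= 1 << shift          # record the quotient term x^shift
        dividend ^= divisor << shift    # subtract (XOR) the shifted divisor
    return quotient, dividend


def encode(message_bits, p=0b10011, r=4):
    """Systematic encoding c(x) = x^r m(x) + (x^r m(x) mod p(x)).

    message_bits is a string listing m_0, m_1, ... (lowest degree first).
    The returned string lists c_0, c_1, ... in the same order.
    """
    m = sum(1 << i for i, bit in enumerate(message_bits) if bit == "1")
    shifted = m << r                    # multiply m(x) by x^r
    _, rem = poly_divmod(shifted, p)    # check digits r(x)
    code = shifted ^ rem
    n = len(message_bits) + r
    return "".join("1" if (code >> i) & 1 else "0" for i in range(n))
```

Here `encode("10101010101")` returns `"110110101010101"`, the codeword of the example, and `poly_divmod` applied to x^4 m(x) and p(x) returns the quotient and remainder shown in the long division.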




2.  Single-Error-Correcting Binary Encoder

By utilizing the previously discussed concepts, we may now
describe the encoding process of the binary (15,11) code as
implemented in a primitive shift register shown in Figure 6.1.
By simply feeding in the message m(x) at the x^4-stage we are
able to simulate the effect of multiplying m(x) by x^4. The
switch remains in position 1 as m(x) is fed completely into
the shift register. The shift register computes the remainder
when x^4 m(x) is divided by p(x), as the shift register is in
essence a division circuit. The register contents after the
information bits have all been fed into the register are the
remainder after division of x^4 m(x) by the generator
polynomial p(x). In the example the remainder is
1101=1+x+x^3. The switch is then changed to position 2 to
allow the check digits to follow the message digits, producing
the coded output 110110101010101 for the example given.
[Ref. 10]
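
A bit-serial model of this encoder is sketched below. It is our own simplified reading of the circuit, not a transcription of Figure 6.1: the name `shift_register_encode` is hypothetical, the message digits are assumed to enter highest degree first, and the two switch positions are modeled simply as the feeding phase and the read-out phase.

```python
def shift_register_encode(message_bits):
    """Bit-serial model of a (15,11) encoder for p(x) = 1 + x + x^4.

    message_bits is a string listing m_0 ... m_10; digits are assumed to
    enter the register highest degree first.
    """
    reg = [0, 0, 0, 0]  # stages holding the coefficients of 1, x, x^2, x^3
    for bit in reversed(message_bits):        # switch in position 1
        fb = int(bit) ^ reg[3]                # input is added at the x^4 point
        reg = [fb, reg[0] ^ fb, reg[1], reg[2]]
    checks = "".join(str(b) for b in reg)     # switch in position 2
    return checks + message_bits              # c_0 ... c_14
```

For the message 10101010101 the register ends in state 1101, so the function returns `"110110101010101"`.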

Figure 6.1   A Single-Error-Correcting Binary Encoder

3.  Double-Error-Correcting Binary Encoder

To design a binary encoder that corrects up to two errors,
additional redundancy must be added. Since we are now
concerned with correction of up to two errors, the generator
polynomial is the product of the two distinct minimal
polynomials M^(1)(x) and M^(3)(x) as described in the previous
chapter. Their product is the polynomial 1+x^4+x^6+x^7+x^8.
The implementation of the encoder is carried out in
essentially the same manner as its single-error counterpart.
The encoder is presented in Figure 6.2. Now n=15 and k=15-8=7
so that there are a smaller number of codewords (2^7) in this
more powerful code. As the error correcting capability of the
code increases, the number of information bits correspondingly
decreases.

B.   REED-SOLOMON  ENCODER 

In  this  section  we  draw  upon  the  work  of  Liu  [Ref.  8] 
and  our  acquired  knowledge  of  finite  field  theory  and  Reed- 
Solomon  codes  to  produce  an  RS  encoder. 

As discussed in Chapter IV, an RS codeword has (2^J-1) symbols
each of which is J bits wide. Of the (2^J-1) symbols there are
(2^J-1-2E) information symbols and 2E parity-check symbols,
where E is the number of symbol errors the RS code is able to
correct. If we treat the (2^J-1-2E) information symbols as the
coefficients of the polynomial

$$f(x) = \sum_{i=1}^{2^J-1-2E} f_i\, x^{2^J-1-i}
  = f_1 x^{2^J-2} + f_2 x^{2^J-3} + \cdots + f_{2^J-1-2E}\, x^{2E}$$

then the 2E parity-check symbols can be obtained as the
coefficients of the remainder of f(x)/g(x) where g(x) is the
generator polynomial of the code. Usually, g(x) is defined as

$$g(x) = \prod_{i=1}^{2E} \bigl(x + \beta^{i}\bigr) = \sum_{j=0}^{2E} g_j x^{j}$$

where β is a primitive element of the Galois field GF(2^J) and
the g_j's are the coefficients of g(x) with g_(2E)=1.

A diagram of the RS encoder which generates the remainder of
f(x)/g(x) is given in Figure 6.3. It is composed of 2E
systolic array multipliers, 2E "exclusive-or" adders, and 2E
shift registers. The coefficients of the generator polynomial
g(x) are fed into their respective systolic multipliers where
the finite field multiplication A*B occurs, as discussed in
Chapter V. Upon completion the partial product is
"exclusive-or'ed" with the contents C of the previous shift
register and distributed down the line to the next shift
register in a pipeline fashion. The switches are normally in
the "ON" position until the last information symbol goes into
the encoder. At this moment all the switches are turned to the
"OFF" position and the encoder behaves like a long shift
register. The output of the encoder is then taken from the
output of the last shift register. [Ref. 8]
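
The arithmetic performed by such an encoder can be modeled in software as a symbol-wide division register. The sketch below is a functional model only and makes several assumptions of its own: it fixes J=4 with the field polynomial 1+x+x^4, stores each symbol as a 4-bit integer (bit i is the coefficient of β^i), and uses the hypothetical names `gf_mul`, `rs_generator`, `rs_parity`, and `poly_eval`. It does not model the systolic timing or the switches.

```python
M, PRIM = 4, 0b10011  # assumed field GF(2^4), p(x) = 1 + x + x^4


def gf_mul(a, b):
    """Shift-and-add product of two field elements."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
    return r


def rs_generator(E):
    """g(x) = (x + beta)(x + beta^2)...(x + beta^(2E)), lowest degree first."""
    g, root = [1], 1
    for _ in range(2 * E):
        root = gf_mul(root, 0b0010)          # next power of beta
        nxt = [0] * (len(g) + 1)
        for j, coeff in enumerate(g):
            nxt[j] ^= gf_mul(coeff, root)    # coeff times beta^i
            nxt[j + 1] ^= coeff              # coeff times x
        g = nxt
    return g


def rs_parity(info, E):
    """Remainder of f(x)/g(x); info = [f_1, f_2, ...], highest degree first."""
    g = rs_generator(E)
    reg = [0] * (2 * E)
    for symbol in info:
        fb = symbol ^ reg[-1]                # feedback symbol
        for j in range(2 * E - 1, 0, -1):
            reg[j] = reg[j - 1] ^ gf_mul(fb, g[j])
        reg[0] = gf_mul(fb, g[0])
    return reg                               # remainder, lowest degree first


def poly_eval(coeffs, x):
    """Horner evaluation of a polynomial given lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc
```

A simple consistency check is that the codeword f(x) plus the remainder evaluates to zero at β, β^2, ..., β^(2E), since it is then a multiple of g(x).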

C.   BINARY  DECODER 

In  this  section  we  discuss  the  decoding  process  and 
design  a  single-error-correcting  binary  decoder  and  a 
double-error-correcting  binary  decoder  both  of  which  can  be 
used  in  conjunction  with  the  binary  encoders  of  Section  A  of 
this  chapter. 
1.  Decoding Process

The decoding process is, in general, much more complicated
than the encoding process. Not only must we deal with the
detection of errors but also with their correction. As a
result, we must be able to design a decoder which
simultaneously detects and corrects errors.

Error detection is usually much easier than error correction.
Recall that a code polynomial is a multiple of the generating
polynomial p(x). In other words, the received polynomial u(x)
will be a code polynomial if and only if the remainder upon
division of u(x) by p(x) is zero, i.e., u(x) ≡ 0 modulo p(x).
An example is given in Table VI. After u(x) is fed completely
into the detecting division register, the register will
contain u(x) modulo p(x). If any of the register contents are
nonzero, u(x) is not a valid codeword. Thus the shift register
acts as an error detector by performing a division of u(x) by
p(x). In fact, the nonzero contents not only indicate that an
error has occurred in transmission, but those contents also
indicate the error pattern needed to correct the error and the
location of the error in the transmitted codeword. [Ref. 10]

2.  Single-Error-Correcting Binary Decoder

Because of the complexity of the decoding process, we will
initially design an error detection register followed by its
error correction counterpart and then
TABLE VI
VERIFICATION OF THE CODE POLYNOMIAL

                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + 1
                 ------------------------------------------------------
    x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^4 + x^3 + x + 1
                  x^14 + x^11 + x^10
                  ------------------
                  x^12 + x^11 + x^8
                  x^12 + x^9 + x^8
                  ------------------
                  x^11 + x^9 + x^6
                  x^11 + x^8 + x^7
                  ------------------
                  x^9 + x^8 + x^7 + x^6 + x^4
                  x^9 + x^6 + x^5
                  ------------------
                  x^8 + x^7 + x^5 + x^4 + x^3
                  x^8 + x^5 + x^4
                  ------------------
                  x^7 + x^3 + x
                  x^7 + x^4 + x^3
                  ------------------
                  x^4 + x + 1
                  x^4 + x + 1
                  ------------------
                  0
synthesize them together to implement the complete decoder.
To begin, we utilize the error detection register of
Figure 6.4. It is identical to the encoding register of
Figure 6.1 except that the received codeword is input to the
decoder at the left end of the register. If the received word
is 110111101010101, then the nonzero contents 0110 after
division indicate that an error has occurred in transmission.
In order to correct the received word we need to know the
error position.

The  received  word  can  be  viewed  as  a  polynomial  u(x) 
which  can  be  written  as  the  sum  of  the  code  polynomial  c(x) 
and  an  error  polynomial  e(x),  namely  u(x)  =  c(x)  +  e(x). 
The  error  polynomial  e(x)  has  ones  in  its  error  positions 
and  zeros  elsewhere,  and  addition  is  term  by  term  modulo  2. 

Since the codewords c(x) are generated as multiples of the
generator polynomial g(x) and since β is a root of g(x), the
code polynomials evaluated at β are equal to zero, namely
c(β) = 0. Thus u(β) = c(β) + e(β) = e(β). Since we assume in
this sub-section that only single errors have occurred in
transmission, we can also assume that if an error occurs then
e(x) is a power of x, say e(x) = x^i for some i. Thus
u(β) = e(β) = β^i.

In order to correct the error we need to compute u(β), which
is called the syndrome of the received word, and then find the
specific value i for which u(β) = β^i. The value i will
indicate the error position. We need then only set
c(x) = u(x) + x^i to obtain the code polynomial c(x), a
multiple of p(x), which is "nearest" to the received
polynomial u(x). The primitive shift register facilitates this
task because while it is computing u(x) modulo p(x) it also
leaves the coefficients of u(β) = β^i in the shift register.
[Ref. 10]

For example, in Figure 6.5 the primitive shift register
computes u(x)=1+x+x^3+x^4+x^5+x^6+x^8+x^10+x^12+x^14 modulo
p(x)=1+x+x^4 and the syndrome is 0110 = x+x^2. Note from
Table I of Chapter III that 0110 is the 4-digit representation
of β^5. Hence the error in the received polynomial occurs in
the position of x^5. Therefore, the code polynomial is
c(x) = u(x)+e(x)
= (1+x+x^3+x^4+x^5+x^6+x^8+x^10+x^12+x^14)+x^5
= 1+x+x^3+x^4+x^6+x^8+x^10+x^12+x^14. The corrected codeword
is 110110101010101 and the corrected information symbols are
10101010101. The same procedure is also illustrated in
Table VII by the actual long division process.
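
The syndrome calculation and the single-error correction of this example can be reproduced with the sketch below. It is a direct software rendering of the procedure and uses a search over the powers of β in place of a table lookup, which is our own simplification; the names `remainder` and `correct_single_error` are hypothetical, and polynomials are stored as integers with bit i holding the coefficient of x^i.

```python
P, R, N = 0b10011, 4, 15  # p(x) = 1 + x + x^4, degree 4, block length 15


def remainder(u):
    """u(x) mod p(x) over GF(2)."""
    while u.bit_length() > R:
        u ^= P << (u.bit_length() - 1 - R)
    return u


def correct_single_error(word):
    """word lists u_0 ... u_14; returns (corrected word, error position).

    The position is None when the syndrome is zero. At most one error
    is assumed, as in the text.
    """
    u = sum(1 << i for i, bit in enumerate(word) if bit == "1")
    syndrome = remainder(u)
    if syndrome == 0:
        return word, None
    # beta^i corresponds to x^i mod p(x); find the i matching the syndrome
    position = next(i for i in range(N) if remainder(1 << i) == syndrome)
    flipped = "0" if word[position] == "1" else "1"
    return word[:position] + flipped + word[position + 1:], position
```

For the received word 110111101010101 the syndrome is x+x^2 and the function returns the corrected word 110110101010101 together with the error position 5.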

We now examine the primitive shift register decoding process
which performs the error correction. After the syndrome is
computed by the primitive shift register division process, an
additional primitive shift register of the same type can be
used to correct the error without reference to a table of
powers of the primitive element β. The correcting register
shown in Figure 6.6 is basically the same primitive shift
register used throughout this chapter with the exception that
there are output lines leading from
TABLE VII
SYNDROME CALCULATION USING LONG DIVISION

                  x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + x + 1
                 ------------------------------------------------------------
    x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^5 + x^4 + x^3 + x + 1
                  x^14 + x^11 + x^10
                  ------------------
                  x^12 + x^11 + x^8
                  x^12 + x^9 + x^8
                  ------------------
                  x^11 + x^9 + x^6
                  x^11 + x^8 + x^7
                  ------------------
                  x^9 + x^8 + x^7 + x^6 + x^5
                  x^9 + x^6 + x^5
                  ------------------
                  x^8 + x^7 + x^4
                  x^8 + x^5 + x^4
                  ------------------
                  x^7 + x^5 + x^3
                  x^7 + x^4 + x^3
                  ------------------
                  x^5 + x^4 + x
                  x^5 + x^2 + x
                  ------------------
                  x^4 + x^2 + 1
                  x^4 + x + 1
                  ------------------
                  x^2 + x

SYNDROME:  x+x^2 = 0110 = β^5

1    β    β^2  β^3  β^4  β^5  β^6  β^7  β^8  β^9  β^10 β^11 β^12 β^13 β^14

1    1    0    1    1    1    1    0    1    0    1    0    1    0    1     = ERROR CODEWORD

1    1    0    1    1    0    1    0    1    0    1    0    1    0    1     = CORRECTED CODEWORD
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each  of  the  four  registers.  If  the  correcting  register 
is  set  initially  at  0100,  the  4-digit  representation  for 
the  element  3,  then,  as  it  shifts,  the  output  is  the  same 
cycle  as  the  4-digit  representation  listing  of  the 
$i ( i=l , 2 , 3 ,  .  .  .  ,15  )  in  Table  I  since  a  shift  in  the  primitive 
shift  register  is  the  same  as  multiplication  by  3.  No 
matter  which  state  the  register  is  set  to  initially  the 
correcting  register  will  output  elements  of  that  maximum- 
cycle  in  the  same  cycle  order  as  long  as  the  register 
continues  to  shift.  If  the  register  is  set  at  31/  it  will 
be  in  state  Bi+J  after  j  shifts.   [Ref.  10] 
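
The claim that one shift equals one multiplication by the
primitive element can be illustrated with a short simulation.
This is a hedged sketch under the assumption that the register
implements 1 + x + x^4 as in the earlier example; the helper
names `shift`, `digits`, and `cycle` are hypothetical. States are
held as integers with bit i equal to the coefficient of the i-th
power, and `digits` prints them in the 4-digit order used in the
text (lowest power first).

```python
def shift(state: int) -> int:
    """One register shift: multiply by the primitive element modulo 1 + x + x^4."""
    state <<= 1
    if state & 0b10000:          # an x^4 term appeared: replace it by 1 + x
        state ^= 0b10011
    return state

def digits(state: int) -> str:
    """4-digit representation, coefficient of the lowest power first."""
    return format(state, "04b")[::-1]

def cycle(start: int, shifts: int) -> list:
    """Register states visited, beginning with the initial state."""
    states = [start]
    for _ in range(shifts):
        states.append(shift(states[-1]))
    return states

# Start at 0100 and shift fifteen times.
for j, s in enumerate(cycle(0b0010, 15)):
    print(j, digits(s))
```

Starting from 0100, the register passes through all fifteen
nonzero states and returns to 0100 on the fifteenth shift, which
is the maximum-cycle behaviour described above.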

Figure 6.7 (the complete single-error-correcting binary decoder)
shows the received word of our example, namely 110111101010101,
whose polynomial form is
$1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}$, in a storage
register and the syndrome 0110 in the correcting register. From
our previous discussion we know that the error occurs in $u_5$.
Thus, if the detector has output 1 as $u_5$ leaves the storage
register and 0 otherwise, the word 110111101010101 will be
corrected after fifteen shifts to read 110110101010101. We
illustrate how the correcting register is used to accomplish this
task by listing the new states of the correcting register, and
the outputs from the storage register and decoder after each
shift. The states are depicted in Table VIII.

Note from Table VIII that the incorrect digit $u_5$ leaves the
storage register when the correcting register is in state 1000.
If the detector is made to produce an output 1 when it detects
1000 and 0 otherwise, then $u_5$ will be properly corrected. In
general, if the syndrome is $\beta^i$, then the error occurs in
the coefficient of $x^i$, namely $u_i$, where the received
polynomial has the form

$$u(x) = \sum_{i=0}^{n-1} u_i x^i .$$

If j is such that $u_{n-j} = u_i$, then $u_i$ leaves the storage
register when the correcting register is in state $\beta^{i+j}$
as shown below:

State              Output
$\beta^{i+1}$      $u_{n-1}$
$\beta^{i+2}$      $u_{n-2}$
...                ...
$\beta^{i+j}$      $u_{n-j} = u_i$

Since $\beta^{i+j} = \beta^n = 1$, whose 4-digit representation
is 1000, the detector will correct the digit $u_i$ and the
received word will be corrected to the nearest code word after
the decoder completes this process. [Ref. 10]

To recapitulate the correction process, the detecting register
computes the syndrome of the received word. As each digit of
u(x) enters the detecting register it simultaneously enters the
storage register. When the syndrome is determined, it is
transferred to the correcting register for the error-correcting
procedure just described.
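
The complete single-error-correcting procedure can be sketched in
software as follows. This is an illustrative model only, assuming
the register for 1 + x + x^4 used in the example; the function
names are hypothetical and the hardware of Figure 6.7 is not
reproduced gate for gate. The detecting register is modelled by
feeding the highest-order digit first, and the detector fires
when the correcting register holds the element 1 (state 1000 in
the 4-digit notation of the text).

```python
def shift(state: int) -> int:
    """Multiply the register contents by the primitive element (mod 1 + x + x^4)."""
    state <<= 1
    if state & 0b10000:
        state ^= 0b10011
    return state

def syndrome(word: str) -> int:
    """Detecting register: digits enter highest power first (Horner's rule)."""
    state = 0
    for bit in reversed(word):
        state = shift(state) ^ int(bit)
    return state

def decode(word: str) -> str:
    """Correcting register plus detector acting on the stored word."""
    n = len(word)
    state = syndrome(word)              # syndrome moved to the correcting register
    out = []
    for j in range(1, n + 1):
        state = shift(state)            # state after j shifts
        digit = int(word[n - j])        # digit u_(n-j) leaves the storage register
        detector = 1 if state == 0b0001 else 0   # element 1, written 1000 in the text
        out.append(digit ^ detector)
    return "".join(str(d) for d in reversed(out))

print(decode("110111101010101"))
```

With the example word, the detector fires on the tenth shift, as
the digit in position 5 leaves the storage register, and the
output is 110110101010101.
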

3.   Double-Error-Correcting Binary Decoder

To implement a double-error-correcting binary decoder we begin
with a general analysis of the three stages that comprise
decoding. The first stage is the Syndrome Generator stage. The
syndrome is defined as the remainder of the received polynomial
when it is divided by the polynomial of the given primitive shift
register; a nonzero remainder indicates that an error has
occurred. The second stage, the Central Galois Field Processor,
finds the error locator polynomial $\sigma(z)$ (usually
accomplished by using Berlekamp's iterative algorithm or Massey's
linear feedback shift register synthesis algorithm). At this
stage the polynomial is determined which defines the location of
the errors that have occurred in transmission. Finally, the third
stage, the Chien Searcher, finds the roots of $\sigma(z)$ to
determine which digits should be corrected. Note that in a binary
code, correction is trivial once the location of the errors is
determined, i.e., the bit in error need only be complemented.
[Ref. 11]

Using our previous double-error-correcting generator polynomial
$1+x^4+x^6+x^7+x^8$, which is the product
$(1+x+x^4)(1+x+x^2+x^3+x^4)$, we are able to produce Stage I of
the decoding process as illustrated by the division process in
Figure 6.8. Similarly, we are also able to produce Stages II and
III (Figures 6.9 and 6.10, respectively) along with a block
diagram of the complete decoder in Figure 6.11.
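
The factorization of the generator polynomial can be confirmed
with a few lines of carry-less (GF(2)) polynomial multiplication.
The sketch below is only a check of the arithmetic; the names
`gf2_poly_mul` and `exponents` are illustrative assumptions, and
polynomials are stored as integers with bit i equal to the
coefficient of x^i.

```python
def gf2_poly_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (shift-and-XOR, no carries)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def exponents(p: int) -> list:
    """Exponents of the nonzero terms of p, lowest first."""
    return [i for i in range(p.bit_length()) if (p >> i) & 1]

m1 = 0b10011   # 1 + x + x^4
m3 = 0b11111   # 1 + x + x^2 + x^3 + x^4
g = gf2_poly_mul(m1, m3)
print(exponents(g))
```

The surviving exponents are 0, 4, 6, 7 and 8, in agreement with
the generator polynomial quoted above.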

The operation of the decoder is relatively straightforward, as
in the previous section. The decoder uses a buffer capable of
storing 2n digits. While the Chien Searcher evaluates
$\sigma(z)$ in order to determine whether or not the next digit
to leave the buffer should be corrected, the Syndrome Generator
computes the syndrome of the word being received and the Central
Galois Field Processor finds the error-locator polynomial for
the buffered word. Once the coefficients of the error-locator
polynomial are read out of the Central Galois Field Processor
and into the Chien Searcher, the syndrome (the nonzero
remainder) of the next block of received words is read into the
Central Galois Field Processor for continual operation. See
[Ref. 5] for further details of the multiple error correction
process.
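
The three stages can be sketched in Python for the (15,7) code
generated by the polynomial above. This is a simplified
illustration under stated assumptions: the field is GF(16) built
from 1 + x + x^4, and Stage II uses the direct closed-form
solution for at most two errors rather than the Berlekamp or
Massey algorithms named in the text. All function names are
hypothetical.

```python
N = 15
EXP = [1]                                   # EXP[i] = i-th power of the primitive element
for _ in range(N - 1):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % N]

def inv(a: int) -> int:
    return EXP[-LOG[a] % N]

def evaluate(word: str, power: int) -> int:
    """Stage I: value of the received polynomial at the given power of the primitive element."""
    acc = 0
    for j, bit in enumerate(word):
        if bit == "1":
            acc ^= EXP[(power * j) % N]
    return acc

def error_locator(s1: int, s3: int) -> list:
    """Stage II (direct solution): sigma(z) = 1 + s1*z + (s1^2 + s3/s1)*z^2."""
    if s1 == 0:
        return [1]                          # no correctable error pattern found
    return [1, s1, mul(s1, s1) ^ mul(s3, inv(s1))]

def chien_search(sigma: list) -> list:
    """Stage III: positions i at which sigma vanishes on the inverse of the i-th power."""
    hits = []
    for i in range(N):
        acc = 0
        for degree, coeff in enumerate(sigma):
            acc ^= mul(coeff, EXP[(-i * degree) % N])
        if acc == 0:
            hits.append(i)
    return hits

def decode(word: str) -> str:
    sigma = error_locator(evaluate(word, 1), evaluate(word, 3))
    bits = list(word)
    for i in chien_search(sigma):
        bits[i] = "0" if bits[i] == "1" else "1"   # binary correction: complement the bit
    return "".join(bits)
```

As a usage example, the generator polynomial itself is a code
word (100010111000000 with the x^0 digit first); complementing
any one or two of its digits and passing the result to `decode`
restores the original word.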

Figure 6.8   Stage I:   The Syndrome Generator

Figure 6.9   Stage II:   The Central Galois Field Processor

Figure 6.10   Stage III:   The Chien Searcher

If the Central Galois Field Processor operates so fast that it
is able to compute the error location before all of the new
received word arrives, then the buffer size may be reduced. In
general, the buffer is made big enough to accommodate the
expected worst case for the time to compute the locations of the
two errors. However, for example, suppose that the Central Galois
Field Processor is able to compute the error locator in half the
time required for n digits to be received from the channel. In
that case, the buffer need only be capable of storing 3n/2
digits. After a complete word is received, the central processor
computes its error location by the time the beginning of this
word is ready to leave the buffer. The error locator is then fed
into the Chien Searcher, and the central processor sits idle
until the rest of the incoming word is received. See [Ref. 5]
for details.
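
The two buffer sizes quoted above (2n digits and 3n/2 digits) are
consistent with a simple linear model: one full word of n digits
plus the digits that arrive while the central processor is still
working. The sketch below encodes that model as an assumption
drawn from these two data points, not as a formula stated in
[Ref. 5]; `buffer_digits` and its parameter are hypothetical
names.

```python
from fractions import Fraction

def buffer_digits(n: int, processor_fraction) -> Fraction:
    """Assumed model: n digits for the stored word plus the digits received
    while the processor runs for processor_fraction of one word time."""
    return n + n * Fraction(processor_fraction)

print(buffer_digits(15, 1))               # processor needs a full word time
print(buffer_digits(15, Fraction(1, 2)))  # processor needs half a word time
```

For n = 15 the model gives 30 digits in the first case and 45/2
digits in the second, which would be rounded up to a whole number
of storage cells in practice.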

Although the above discussion pertains strictly to a binary
decoder capable of correcting two errors, it can be generalized
to correct t or fewer errors. By expanding the hardware in
Stages I and III to accommodate the additional shift register
size required by t distinct minimal polynomials, we are able to
implement the decoder with approximately the same effectiveness.
Likewise, the same procedure of utilizing the product of t
distinct minimal polynomials would also be used in the design of
a multiple-error-correction binary encoder.

D.   REED-SOLOMON  DECODER 

As with any multiple-error detection and correction process, the
decoding of RS codes is very complex. As a result, the known
decoding procedures as discussed by Liu [Ref. 12] will be
presented in this section to obtain a repetitive and recursive
technique which is suitable for systolic array development and
eventual VLSI implementation.

Recall that the information symbols of an RS code are treated as
the coefficients of the polynomial f(x). If we let

$$f(x) = f_0 + f_1 x + \cdots + f_{N-1} x^{N-1}$$

be the transmitted code vector (where N = codeword length), and
let

$$r(x) = r_0 + r_1 x + \cdots + r_{N-1} x^{N-1}$$

be the received code vector over a noisy channel, then the error
pattern added by the channel is

$$e(x) = r(x) - f(x) = e_0 + e_1 x + \cdots + e_{N-1} x^{N-1}.$$

The first step of the decoding procedure is to store the
received code vector $r_j$ into the buffer register and then
compute the syndrome $S_i$ using the equation

$$S_i = r(\beta^{k+i}) = \sum_{j=0}^{N-1} r_j \beta^{(k+i)j} \tag{5}$$

where $0 \le i \le 2E-1$. Since $r_j = f_j + e_j$, equation (5)
can be expressed as

$$S_i = \sum_{j=0}^{N-1} (f_j + e_j)\,\beta^{(k+i)j}
      = \sum_{j=0}^{N-1} f_j \beta^{(k+i)j} + \sum_{j=0}^{N-1} e_j \beta^{(k+i)j}
      = F_{k+i} + E_{k+i} \tag{6}$$

In the above equation

$$F_{k+i} = \sum_{j=0}^{N-1} f_j \beta^{(k+i)j} \tag{7}$$

and

$$E_{k+i} = \sum_{j=0}^{N-1} e_j \beta^{(k+i)j} \tag{8}$$

Note that in equation (8) $E_{k+i}$ is the finite field
transform of the $e_j$'s. Because f(x) is a codeword, $F_{k+i}$
vanishes for $0 \le i \le 2E-1$, so over this range the
syndrome $S_i$ is equal to $E_{k+i}$.
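
Equations (5) through (8) can be illustrated with a small
finite field transform over GF(16). The sketch below is only an
illustration: it assumes N = 15 and the field generated by
1 + x + x^4 from the binary examples, with symbols stored as
4-bit integers; `spectral_component` and `syndromes` are
hypothetical names, and the values of k and E are passed in as
parameters.

```python
N = 15
EXP = [1]                                   # EXP[i] = i-th power of the primitive element
for _ in range(N - 1):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a: int, b: int) -> int:
    """Product of two GF(16) symbols via log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % N]

def spectral_component(vec: list, m: int) -> int:
    """Sum over j of vec[j] times the (m*j)-th power of the primitive element."""
    acc = 0
    for j, symbol in enumerate(vec):
        acc ^= mul(symbol, EXP[(m * j) % N])   # addition in GF(2^m) is XOR
    return acc

def syndromes(r: list, k: int, E: int) -> list:
    """S_i for i = 0 .. 2E-1, as in equation (5)."""
    return [spectral_component(r, k + i) for i in range(2 * E)]
```

Because addition in the field is a bitwise XOR, the transform is
linear: the component computed from r equals the XOR of the
components computed from f and e separately, which is the content
of equation (6).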

The second step of the decoding procedure is to compute
$\sigma_l$ for $1 \le l \le v$ (where v = number of errors)
using the equation

$$S_i + \sum_{l=1}^{v} S_{i-l}\,\sigma_l = 0 \quad \text{for } 0 \le i \le 2E-1$$

from the syndromes computed in the previous step. This can be
accomplished using Berlekamp's iterative algorithm or Massey's
linear feedback shift register (LFSR) synthesis algorithm.
[Ref. 12]

Upon obtaining the $\sigma$'s, the third step of the decoding
procedure is to use the recursive equation

$$E_{k+i} + \sum_{l=1}^{v} E_{k+i-l}\,\sigma_l = 0 \quad \text{for } 2E \le i \le N-1$$

where

$$E_{k+i} = E_{k+i-N} \quad \text{for } k+i \ge N$$

to compute the remaining $E_{k+i}$ for $2E \le i \le N-1$.

After determining the transform of the error pattern $E_{k+i}$
for $0 \le k+i \le N-1$, by equation (8), we can then apply the
inverse transform to $E_{k+i}$ to obtain the error pattern

$$e_j = (N)^{-1} \sum_{k+i=0}^{N-1} E_{k+i}\,\beta^{-(k+i)j}$$

for $j = 0, 1, 2, \ldots, N-1$. Then the corrected codeword is
obtained by subtracting the error pattern $e_j$ from the stored
code vector $r_j$ in the buffer register.

In summary, the decoding of an RS code is composed of the
following five steps:

1)   Compute the syndrome $S_i$ using the equation

$$S_i = r(\beta^{k+i}) = \sum_{j=0}^{N-1} r_j \beta^{(k+i)j}$$

2)   Use Berlekamp's iterative algorithm or Massey's LFSR
synthesis algorithm to determine the coefficients $\sigma_l$ of
the error locator polynomial from the known $S_i = E_{k+i}$ for
$i = 0, 1, 2, \ldots, 2E-1$.

3)   Compute the remaining $E_{k+i}$ from the known $S_i$ using
the equation

$$E_{k+i} + \sum_{l=1}^{v} E_{k+i-l}\,\sigma_l = 0$$

for $2E \le i \le N-1$.

4)   Compute the inverse transform

$$e_j = (N)^{-1} \sum_{k+i=0}^{N-1} E_{k+i}\,\beta^{-(k+i)j}$$

to obtain the error pattern, where $(N)^{-1}$ is the inverse of
N.

5)   Subtract the error pattern $e_j$ from the received code
vector $r_j$ in the buffer memory to obtain the corrected
codeword.
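
The five steps can be traced end to end in a compact software
model. The following Python sketch is an illustration under
explicit assumptions, not the systolic design itself: the field
is GF(16) built from 1 + x + x^4, N = 15, E = 2 and k = 1 are
chosen for the example, step 2 uses a textbook form of Massey's
LFSR synthesis, and all function names are hypothetical. Since N
is odd and the field has characteristic 2, the factor
$(N)^{-1}$ equals 1 here.

```python
N, E, K = 15, 2, 1                          # assumed example parameters

EXP = [1]                                   # EXP[i] = i-th power of the primitive element
for _ in range(N - 1):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % N]

def inv(a):
    return EXP[-LOG[a] % N]

def transform(vec, sign=1):
    """out[m] = sum_j vec[j] * beta^(sign*m*j); sign=-1 is the inverse (N^-1 = 1)."""
    out = []
    for m in range(N):
        acc = 0
        for j, x in enumerate(vec):
            acc ^= mul(x, EXP[(sign * m * j) % N])
        out.append(acc)
    return out

def lfsr_synthesis(S):
    """Massey's algorithm: shortest [sigma_1..sigma_v] with S[i] + sum sigma_l*S[i-l] = 0."""
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n, s in enumerate(S):
        d = s                               # discrepancy of the current LFSR
        for l in range(1, L + 1):
            d ^= mul(C[l], S[n - l])
        if d == 0:
            m += 1
            continue
        T, coef = C[:], mul(d, inv(b))
        C = C + [0] * (len(B) + m - len(C))
        for i, bi in enumerate(B):
            C[i + m] ^= mul(coef, bi)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[1:L + 1]

def decode(r):
    R = transform(r)
    S = [R[(K + i) % N] for i in range(2 * E)]          # step 1: syndromes
    sigma = lfsr_synthesis(S)                           # step 2: error locator
    spec = [0] * N
    for i in range(N):                                  # step 3: extend the spectrum
        if i < 2 * E:
            spec[(K + i) % N] = S[i]
        else:
            acc = 0
            for l, s in enumerate(sigma, start=1):
                acc ^= mul(s, spec[(K + i - l) % N])
            spec[(K + i) % N] = acc
    e = transform(spec, sign=-1)                        # step 4: inverse transform
    return [x ^ y for x, y in zip(r, e)]                # step 5: subtract (XOR)
```

To exercise the sketch, a codeword can be produced by choosing
any spectrum whose components 1 through 4 are zero and applying
`transform(spectrum, sign=-1)`; adding up to two nonzero symbol
errors and calling `decode` should return the original codeword.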

Note that in steps 1, 4, and 5, the processing time is
proportional to N*J, while in steps 2 and 3 the processing time
is proportional to 2E*J and (N-2E)*J, respectively. Hence, a
natural partition for pipeline processing is to divide the
decoder system into three pipeline stages. Stage 1 is used to
perform step 1, stage 2 is used to perform steps 2-4, and stage
3 is used to perform step 5. To obtain a uniform throughput of
one decoded symbol per symbol clock cycle, each pipeline stage
is required to complete its computations in N symbol clock
cycles. As always, the throughput of the system is determined by
the slowest stage in the pipeline. [Ref. 12]

The RS decoder architecture using the above pipeline decoding
technique is shown in Figure 6.12. The timing chart of the
decoder is shown in Figure 6.13. In both figures, note that the
first 2E input symbols of the inverse transform, which are
$S_0, S_1, \ldots, S_{2E-1}$, can be processed in parallel with
the Berlekamp/Massey LFSR synthesis algorithm. The remaining
N-2E input symbols of the inverse transform are obtained from
the remaining transform. Each of these N-2E input symbols is
processed by the inverse transform circuit immediately after its
generation. In stage 3, the buffer memory is read out
symbol-by-symbol and "Exclusive-OR'ed" with the output of the
inverse transform. A triple-buffered memory is required to store
the three active codewords in the pipeline. [Ref. 12]

VII.   CONCLUSION

In this thesis we have taken a modular approach to the systolic
implementation of a Reed-Solomon encoder and decoder. By
initially discussing the theory behind systolic arrays and
finite fields, we have shown how they play an integral part in
the overall implementation. The binary case is presented first
because of its simple architecture and ease of understanding. It
is then followed by the design of a systolic multiplier and an
RS encoder and decoder.

The multiplier requires m basic cells for the finite field
$GF(2^m)$. Because of its simple control methodology, regular
interconnection pattern, and modular structure, it is highly
suited for VLSI implementation. The encoder using the systolic
multiplier offers the advantages of lower power consumption,
minimal size, and high reliability. The decoder, being modular
in design, is also highly suited for a systolic architecture;
its decoding speed can easily be increased by using a
distributed processing scheme. In this way, several decoders can
operate in parallel, while each individual decoder operates in a
pipeline fashion.

The design of both the RS encoder and decoder is simple and
regular. They can be constructed using a systolic array of
identical cells with every interconnection path occurring
between adjacent cells. This makes implementation in VLSI
extremely attractive since the layout of the cell need only be
done once and then replicated.

It is hoped that with this thesis as a guide, an interested
electrical engineering student could implement the encoder or
decoder in hardware. By building the four-cell binary encoder
first, the student would establish a firm foundation vital to
the development of the more complicated RS encoder. This process
could then be expanded to produce an encoder of eight or sixteen
cells, or the more general case of 2m.